AITopics | convergence and asymptotic error scaling

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing SystemsNov-20-2025, 21:51:36 GMT

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.

convergence and asymptotic error scaling, long time convergence, neural network, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Grant Rotskoff, Eric Vanden-Eijnden

Neural Information Processing SystemsNov-20-2025, 14:47:06 GMT

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case.

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Reviews: Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing SystemsOct-7-2024, 05:57:27 GMT

If so, I am confused why this is highlighted as a virtue of adding noise, since the purely deterministic dynamics of GD also evince this behavior. Numerical experiments: These are slightly hard to interpret. First, which plots show SGD dynamics, and which are for GD? Second, I'm puzzled by how to interpret the dotted lines in each plot. In the case of RBF, how are we to make sense of the empirical n {-2} decay? Is this somehow predicted in the analysis of the GD, or is it an empirical phenomenon which is not theoretically addressed in this work.

convergence and asymptotic error scaling, long time convergence, neural network, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Add feedback

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Rotskoff, Grant, Vanden-Eijnden, Eric

Neural Information Processing SystemsFeb-14-2020, 19:26:05 GMT

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n {-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning.

convergence and asymptotic error scaling, long time convergence, neural network, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Rotskoff, Grant, Vanden-Eijnden, Eric

Neural Information Processing SystemsDec-31-2018

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.05)
North America > United States > New York (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Rotskoff, Grant, Vanden-Eijnden, Eric

Neural Information Processing SystemsDec-31-2018

The performance of neural networks on high-dimensional data distributions suggests that it may be possible to parameterize a representation of a given high-dimensional function with controllably small errors, potentially outperforming standard interpolation methods. We demonstrate, both theoretically and numerically, that this is indeed the case. We map the parameters of a neural network to a system of particles relaxing with an interaction potential determined by the loss function. We show that in the limit that the number of parameters $n$ is large, the landscape of the mean-squared error becomes convex and the representation error in the function scales as $O(n^{-1})$. In this limit, we prove a dynamical variant of the universal approximation theorem showing that the optimal representation can be attained by stochastic gradient descent, the algorithm ubiquitously used for parameter optimization in machine learning. In the asymptotic regime, we study the fluctuations around the optimal representation and show that they arise at a scale $O(n^{-1})$. These fluctuations in the landscape identify the natural scale for the noise in stochastic gradient descent. Our results apply to both single and multi-layer neural networks, as well as standard kernel methods like radial basis functions.

artificial intelligence, machine learning, neural network, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Collaborating Authors

convergence and asymptotic error scaling

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Reviews: Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks